THE RELATIONSHIP BETWEEN MATERIAL FAILURES AND
FLIGHT HOURS: STATISTICAL CONSIDERATIONS

INTRODUCTION 

This paper attempts to clarify the relationships among the following four
hypotheses: (1) the number of material failures across intervals of calendar
time containing equal accumulated flight hours follows a Poisson distribution;
(2) the number of elapsed flight hours between successive independent
material failures follows an exponential distribution; (3) the expected number
of monthly material failures is exactly proportional to monthly flight hours;
and (4) the observed number of monthly material failures is strongly
correlated with monthly flight hours.

By a well-known result, hypotheses (1) and (2) are equivalent. By a
second well-known result, hypotheses (1) and (3) are also equivalent.
However, hypotheses (1) and (4) are not equivalent. That is, while the expected
number of failures is exactly proportional to flight hours under a Poisson
distribution, this relationship is not revealed by a linear regression between
failures and flight hours. For example, we demonstrate that if the mean and
variance of flight hours across months are equal and if the failure rate per
flight hour equals .01, then the correlation between failures and flight hours
will equal only .10. Moreover, the squared correlation, corresponding to the
regression R-squared statistic, will equal only .01.


To test for a relationship between flight hours and the expected number
of failures, linear regression is not the appropriate tool. Instead, we must test
the underlying hypothesis that the data follow a Poisson distribution. If the
Poisson distribution fits the data, then the expected number of failures will be
exactly proportional to flight hours because hypothesis (1) implies hypothesis
(3) above.

We  present  three  goodness-of-fit  tests  for  a  Poisson  distribution.  It  is 
sometimes  asserted  that,  despite  its  mathematical  tractability,  the  Poisson 
distribution  does  not  fit  any  real-world  data.  Contrary  to  this  assertion,  we 
show  that  the  Poisson  distribution  is  perfectly  adequate  to  describe  data  on 
Navy  A-7  accidents  over  the  period  CY  1977  —  CY  1983. 

Use of the Poisson distribution imposes the restriction that the mean and
variance of the data are equal. In many situations the negative binomial
distribution may provide a superior fit to the data, because the negative binomial
distribution allows the variance to exceed the mean. Moreover, the negative
binomial distribution may be derived from the Poisson distribution by
assuming that the mean of the Poisson distribution is itself randomly
distributed across the observational units according to a Gamma distribution.


Although the negative binomial distribution is more flexible than the
Poisson distribution, it is not completely satisfactory because it "explains" the
excess variation in the data by simply adding an additional source of
randomness. A more useful approach may be to explain the data by
introducing observable variables thought to influence the failure rate, such as crew
manning and experience levels. The correct technique for estimating the
influence of the variables is not linear regression, because linear regression
does not even reveal the proportionality between flight hours and the expected
number of failures. A better approach is to express the failure rate (rather
than the number of failures) as a function of explanatory variables using a
maximum likelihood technique known as Poisson regression. We present an
exposition of Poisson regression and also some recently developed
generalizations of that technique.

STATISTICAL  PRELIMINARIES 

The Poisson distribution is frequently used to describe the number of
occurrences of an event in a fixed-length interval of time. Let $g(X, t)$ denote
the probability of $X$ occurrences in an interval of length $t$. Then the Poisson
distribution is given by:

$$g(X, t) = \frac{(\lambda t)^X \exp(-\lambda t)}{X!} \qquad (1)$$

for $X = 0, 1, 2, 3, \ldots$, where $\lambda > 0$ is the instantaneous failure rate. The Poisson
distribution has the property $E(X; \lambda) = \mathrm{Var}(X; \lambda) = \lambda t$, so that the mean and
variance are constrained to equality.
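
As a quick numerical check of the equality of the mean and variance, the following Python sketch sums the Poisson probabilities directly. The function name and the rate and interval length are illustrative assumptions, not values taken from the data in this paper.

```python
import math

def poisson_pmf(x, lam, t):
    """Probability of x occurrences in an interval of length t (Poisson)."""
    mu = lam * t
    # Work on the log scale to avoid overflow in mu**x and x!.
    return math.exp(x * math.log(mu) - mu - math.lgamma(x + 1))

# Hypothetical inputs: 0.01 failures per flight hour over 250 flight hours.
lam, t = 0.01, 250.0
support = range(200)  # the tail beyond 200 is negligible when lam * t = 2.5
mean = sum(x * poisson_pmf(x, lam, t) for x in support)
second_moment = sum(x * x * poisson_pmf(x, lam, t) for x in support)
variance = second_moment - mean ** 2
print(mean, variance)  # both are approximately lam * t = 2.5
```

Both sums agree with the product of the rate and the interval length up to floating-point error, which is the equality that the later goodness-of-fit tests exploit.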

There are several alternative characterizations of the Poisson
distribution. Let $o(t)$ be any function with the property $o(t)/t \to 0$ as $t \to 0$. Then
by a well-known result,^1 the following comprise a set of sufficient conditions
for the Poisson distribution:

1. $g(1, t) = \lambda t + o(t)$ for all $t > 0$ and some $\lambda > 0$

2. $\sum_{X=2}^{\infty} g(X, t) = o(t)$

3.  The  numbers  of  occurrences  in  any  set  of  non-overlapping  intervals 
are  statistically  independent. 

The first condition states that the probability of exactly one occurrence is
proportional to the length of the interval for arbitrarily small intervals. The
second condition states that the probability of multiple occurrences
approaches zero as the length of the interval approaches zero.

Finally, by another well-known result,^2 the number of occurrences in a
fixed-length interval of time has a Poisson distribution if and only if the
elapsed time between occurrences has an exponential distribution:

$$f(t) = \lambda \exp(-\lambda t). \qquad (2)$$


CORRELATION  BETWEEN  X  AND  t 

As we stated earlier, a property of the Poisson distribution is $E(X; \lambda) = \lambda t$,
so that the expected number of occurrences is proportional to the length of
the interval. Suppose we have monthly data on material failures ($X$) and
flight hours ($t$). If we believe that the observations follow a Poisson
distribution, then in view of the proportionality of the mean we might expect a high
correlation coefficient between $X$ and $t$. Perhaps surprisingly, this is not the
case.


1. See, for example, Hogg and Craig [1], pages 94-96.

2. See, for example, Hogg and Craig [1], pages 100-101.


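Note that the density of $t$ across months is arbitrary; we require only
that it have finite mean and variance. Let $E_t$ denote the expectation over the
density of $t$. Following the analysis of Brown and Rogers^1 [2]:

$$\begin{aligned}
E(X) &= E_t E(X \mid t) \\
     &= E_t(\lambda t) \\
     &= \lambda E(t).
\end{aligned}$$

$$\begin{aligned}
E(Xt) &= E_t E(Xt \mid t) \\
      &= E_t[t\,E(X \mid t)] \\
      &= \lambda E(t^2).
\end{aligned}$$

$$\begin{aligned}
E(X^2) &= E_t E(X^2 \mid t) \\
       &= E_t[\mathrm{Var}(X \mid t) + E^2(X \mid t)] \\
       &= \lambda E(t) + \lambda^2 E(t^2).
\end{aligned}$$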

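Combining these results:

$$\begin{aligned}
\mathrm{Var}(X) &= E(X^2) - E^2(X) \\
                &= \lambda E(t) + \lambda^2[E(t^2) - E^2(t)] \\
                &= \lambda E(t) + \lambda^2 \mathrm{Var}(t).
\end{aligned}$$

$$\begin{aligned}
\mathrm{Cov}(X, t) &= E(Xt) - E(X)E(t) \\
                   &= \lambda E(t^2) - \lambda E^2(t) \\
                   &= \lambda \mathrm{Var}(t).
\end{aligned}$$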

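It follows that:

$$\mathrm{Correlation}(X, t) = \left[\frac{\mathrm{Cov}^2(X, t)}{\mathrm{Var}(X)\,\mathrm{Var}(t)}\right]^{1/2} = \left[1 + \frac{E(t)}{\lambda\,\mathrm{Var}(t)}\right]^{-1/2}. \qquad (3)$$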

The correlation will be small as long as $\lambda$ is small and $\mathrm{Var}(t)/E(t)$ is not
too large. For example, suppose $E(t) = \mathrm{Var}(t)$ and $\lambda = .01$. Then the
correlation will equal .10. Moreover, the squared correlation, corresponding to the
regression R-squared statistic, will equal .01. Hence even if the observations
follow a Poisson distribution exactly, we would not expect a linear regression
between $X$ and $t$ to reveal any apparent relationship. The relationship
between $E(X)$ and $t$ is exactly proportional, but this relationship is not revealed
in the simple scatter plot between $X$ and $t$. Moreover, even if $\mathrm{Var}(t)/E(t)$ is
large so that the correlation is high, a regression equation will still not
provide accurate forecasts of future material failures because future flight
hours are highly uncertain and unpredictable in this case.

1. Brown and Rogers actually derive a generalization of what follows in the case of a
negative binomial distribution rather than a Poisson distribution.
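
The numerical example can be checked with a few lines of Python that evaluate equation (3) directly. This is only a sketch; the function name is ours, and the common value assigned to the mean and variance of flight hours is an arbitrary placeholder, since only their ratio enters the formula.

```python
def poisson_correlation(lam, mean_t, var_t):
    """Correlation between X and t implied by equation (3)."""
    return (1.0 + mean_t / (lam * var_t)) ** -0.5

# The example in the text: E(t) = Var(t) and a failure rate of .01.
# Only the ratio E(t)/Var(t) matters, so the common value 100.0 is arbitrary.
rho = poisson_correlation(lam=0.01, mean_t=100.0, var_t=100.0)
print(round(rho, 2), round(rho ** 2, 2))  # 0.1 0.01
```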

We may also examine the regression coefficient rather than the
correlation coefficient. The slope coefficient $\hat{\lambda}$ from a linear regression of
material failures on flight hours will be unbiased:

$$E(\hat{\lambda}) = \mathrm{Cov}(X, t)/\mathrm{Var}(t) = \lambda.$$

However, the variance of the regression coefficient will be quite large so that
the coefficient, although unbiased, will not be precisely estimated. To
compute the variance of the regression coefficient, recall the following
relationship between the R-squared statistic, t-statistic, and F-statistic from
elementary regression theory:^1

$$\hat{\lambda}^2/\mathrm{Var}(\hat{\lambda}) = (t\text{-statistic})^2 = F = (N-2)R^2/(1-R^2)$$

where $N$ is the sample size. Using equation (3), we find that $\mathrm{Var}(\hat{\lambda})$ reduces to:

$$\mathrm{Var}(\hat{\lambda}) = [\lambda E(t)] / [(N-2)\,\mathrm{Var}(t)].$$

To illustrate the magnitude of $\mathrm{Var}(\hat{\lambda})$, again suppose that $E(t) = \mathrm{Var}(t)$ and
$\lambda = .01$. Then $\mathrm{Var}(\hat{\lambda})$ reduces to $.01/(N-2)$. Hence to achieve a "t-ratio" of 2.0,
implying statistical significance, the sample size would have to be at least
402. Expressed differently, confidence intervals for $\lambda$ will be extremely wide
even for sample sizes of several hundred. Like correlation analysis, regression
analysis^2 fails to reveal the proportionality between $E(X)$ and $t$.
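
The sample-size figure follows from dividing the failure rate by the square root of the variance formula above. The Python sketch below, with function names of our own choosing, searches for the smallest sample size whose t-ratio reaches 2.0 under the assumptions of the example.

```python
import math

def slope_t_ratio(n, lam, mean_t, var_t):
    """Ratio of lam to its standard error for a sample of n months."""
    var_slope = lam * mean_t / ((n - 2) * var_t)
    return lam / math.sqrt(var_slope)

def required_sample_size(target, lam, mean_t, var_t):
    """Smallest n whose t-ratio reaches the target, found by direct search."""
    n = 3
    # The small tolerance guards against floating-point rounding at equality.
    while slope_t_ratio(n, lam, mean_t, var_t) < target - 1e-12:
        n += 1
    return n

# E(t) = Var(t) (the common value 1.0 is arbitrary) and a failure rate of .01.
print(required_sample_size(2.0, lam=0.01, mean_t=1.0, var_t=1.0))  # 402
```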


1.  See,  for  example,  Johnston  [3],  pages  35-38. 

2. The regression may also be computed without an intercept, to reflect the strict
proportionality between $E(X)$ and $t$. The regression slope will still be unbiased, the only
difference being that $(N-1)$ replaces $(N-2)$ as the degrees-of-freedom in the formulae derived
in the text.
To test for a relationship between $E(X)$ and $t$, we must test the
underlying hypothesis that the observations follow a Poisson distribution.
Examples of such tests are given in the next section.

TESTS  FOR  A  POISSON  DISTRIBUTION 

It  is  sometimes  asserted  that,  despite  its  mathematical  tractability,  the 
Poisson  distribution  does  not  fit  any  real-world  data.  In  this  section,  we  will 
discuss  several  tests  for  a  Poisson  distribution  and  provide  a  counterexample 
to  the  above  assertion. 

As we stated earlier, a property of the Poisson distribution is $E(X; \lambda) =
\mathrm{Var}(X; \lambda) = \lambda t$. This property provides a test of the Poisson distribution. If
the mean and variance are equal, then we accept the Poisson distribution; but
if the variance exceeds the mean, we must consider an alternative such as the
negative binomial distribution.

We apply this test to data on Class A accidents^1 involving Navy A-7
aircraft over the period CY 1977 through CY 1983. Figure 1 plots the monthly
number of Class A accidents against the monthly number of A-7 flight hours.
Although there does not appear to be any visual relationship, the simple
correlation across these 84 monthly data points is actually negative,^2 equal to
-.245. However, as we saw in the previous section, this perverse correlation
does not invalidate the assumption of a Poisson distribution or the resulting
positive proportionality between flight hours and the expected number of
accidents.

To  test  for  equality  between  the  mean  and  variance,  we  tabulated  the 
number  of  monthly  accidents  and  then  calculated  the  sample  mean  and 
sample  variance  across  the  12  months  of  each  respective  year.  If  the  data 
follow  a  Poisson  distribution,  then  the  mean  in  each  year  should  equal  the 
variance  in  that  year.  The  mean  and  variance  may  differ  from  one  year  to 
another,  but  they  should  move  in  proportion  to  each  other  and  to  annual  flight 
hours.  Hence,  if  we  plot  the  mean  and  variance  in  each  year,  the  seven  annual 
data  points  should  lie  along  a  45-degree  line  through  the  origin. 
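
The tabulation described above amounts to computing a sample mean and a sample variance for each block of 12 monthly counts. A minimal Python sketch follows; the monthly counts are hypothetical placeholders, not the A-7 data, and the function name is ours.

```python
import statistics

def yearly_mean_and_variance(monthly_counts):
    """Sample mean and sample variance of the 12 monthly counts in one year."""
    return statistics.mean(monthly_counts), statistics.variance(monthly_counts)

# Hypothetical monthly accident counts for two years (not the A-7 data).
counts_by_year = {
    "year 1": [0, 1, 0, 2, 1, 0, 1, 3, 0, 1, 2, 1],
    "year 2": [2, 0, 1, 1, 0, 0, 3, 1, 0, 2, 1, 1],
}
for year, counts in counts_by_year.items():
    mean, variance = yearly_mean_and_variance(counts)
    # Under a Poisson distribution the ratio should be close to 1.
    print(year, mean, round(variance, 3), round(variance / mean, 3))
```

Each (mean, variance) pair is one point on the plot; points near the 45-degree line correspond to a variance-to-mean ratio near 1.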


1.  Class  A  accidents  are  those  which  result  in  either  a  fatality,  complete  destruction  of  an 
aircraft,  or  at  least  $500,000  damage. 

2. If the distribution is Poisson, then equation (3) is applicable. Using our sample
estimates of $E(t)$, $\mathrm{Var}(t)$, and $\lambda$, equation (3) predicts a correlation of +.141. We have not
tested whether this discrepancy could reasonably be due to chance.
FIGURE 1: MONTHLY FLIGHT HOURS AND ACCIDENTS


Figure  2  presents  the  plot  of  the  mean  and  variance.  Two  of  the  points 
lie  exactly  along  the  45-degree  line.  Four  points  lie  slightly  below  the  line, 
and  one  outlier  lies  well  above  the  line.  The  outlier  is  due  to  the  five  accidents 
that  occurred  during  a  single  month  of  that  year.  Overall,  the  data  do  not 
depart  radically  from  the  pattern  expected  under  a  Poisson  distribution. 

As  a  second  test,  we  compared  the  observed  frequency  distribution  of 
accidents  per  period  to  the  best-fitting  theoretical  Poisson  distribution.  We 
could  not  make  this  comparison  directly  using  the  monthly  data  because  flight 
hours  and  hence  the  Poisson  parameter  vary  across  months,  so  that  no  single 
Poisson  distribution  could  possibly  fit  the  entire  data  set.  Instead,  to  stabilize 
the Poisson parameter we divided our sample into 81 intervals of calendar
time each containing 15,000 accumulated flight hours.^1 We estimated the
Poisson  parameter  for  an  interval  of  this  length  as  15,000  times  total 
accidents  divided  by  total  flight  hours,  yielding  the  value  1.372.  This 
parameter  is  quite  precisely  estimated,  with  a  standard  error  of  only  0.130. 
The  observed  distribution  and  the  theoretical  distribution  with  parameter 
1.372  are  both  presented  in  table  1. 


TABLE 1

OBSERVED AND THEORETICAL DISTRIBUTION
OF ACCIDENTS^a

Number of accidents    Observed frequency    Theoretical frequency

         0                     23                    20.5
         1                     28                    28.2
         2                     16                    19.3
         3                      8                     8.8
         4                      4                     3.0
         5                      1                     0.8
         6                      1                     0.2

a. The time interval is 15,000 flight hours, and the corresponding Poisson
parameter is 1.372.
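
The parameter estimate, its standard error, and the theoretical column of table 1 can be recomputed from the observed frequencies. The Python sketch below rests on two assumptions of ours: that the sample consists of 80 full intervals plus a final interval of 13,495 flight hours, and that the standard error is that of a Poisson rate (the square root of the count divided by the exposure).

```python
import math

# Observed frequencies from table 1: index = number of accidents per interval.
observed = [23, 28, 16, 8, 4, 1, 1]
n_intervals = sum(observed)                                   # 81 intervals
total_accidents = sum(x * f for x, f in enumerate(observed))
# Assumed: 80 full intervals of 15,000 flight hours plus a final one of 13,495.
total_hours = 80 * 15_000 + 13_495

rate = 15_000 * total_accidents / total_hours
# Assumed form: standard error of a Poisson rate, sqrt(count) / exposure.
std_error = 15_000 * math.sqrt(total_accidents) / total_hours

def poisson_pmf(x, mu):
    """Poisson probability of x occurrences when the expected count is mu."""
    return math.exp(-mu) * mu ** x / math.factorial(x)

theoretical = [round(n_intervals * poisson_pmf(x, rate), 1)
               for x in range(len(observed))]
print(round(rate, 3), round(std_error, 3))  # 1.372 0.13
print(theoretical)  # [20.5, 28.2, 19.3, 8.8, 3.0, 0.8, 0.2]
```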


1. Actually, the final interval contained only 13,495 flight hours because the total number
of flight hours was not evenly divisible by 15,000.
We performed a chi-squared goodness-of-fit test to compare the observed
and theoretical frequencies. We combined the cells corresponding to 4, 5, and
6 accidents in order to achieve a frequency of at least 4 in every cell. We
computed a chi-squared statistic of 1.943 with 3 degrees-of-freedom. This
value is much less than the 10 percent significance point of 6.251. Hence we
cannot reject the hypothesis of a Poisson distribution.
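
The chi-squared statistic can be reproduced from the entries of table 1. The Python sketch below uses the rounded theoretical frequencies as printed in the table; the variable names are ours.

```python
# Observed and theoretical frequencies from table 1 for 0, 1, 2, 3 accidents,
# with the cells for 4, 5, and 6 accidents pooled into a final cell.
observed = [23, 28, 16, 8, 4 + 1 + 1]
theoretical = [20.5, 28.2, 19.3, 8.8, 3.0 + 0.8 + 0.2]

chi_squared = sum((o - e) ** 2 / e for o, e in zip(observed, theoretical))
# One degree of freedom is lost to the total and one to the estimated parameter.
degrees_of_freedom = len(observed) - 1 - 1
critical_10_percent = 6.251  # significance point quoted in the text

print(round(chi_squared, 3), degrees_of_freedom)  # 1.943 3
print(chi_squared < critical_10_percent)          # True
```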

As a final test, we exploited the fact that the number of accidents per
time period has a Poisson distribution if and only if the elapsed flight hours
between successive accidents have an exponential distribution. We first
computed the elapsed flight hours between successive accidents and placed
them in ascending order to obtain the sample order statistics, which we denote
$t_i$ for $i = 1, \ldots, n$. If the time between successive accidents, $T$, is exponentially
distributed, then the cumulative distribution function at $T$,
$Y = F(T) = 1 - \exp(-\lambda T)$, is uniformly distributed in the interval [0,1]. The $i$th order
statistic in a sample of size $n$ from a uniform distribution has expected value
$i/(n+1)$. It follows that:

$$E[1 - \exp(-\lambda t_i)] = i/(n+1).$$

If we ignore the expectation operator and solve for $t_i$, we obtain:

$$t_i = -(1/\lambda)\log[1 - i/(n+1)]. \qquad (4)$$

Equation (4) suggests plotting the $i$th order statistic, $t_i$, against
$-\log[1 - i/(n+1)]$. If the distribution is exponential, then all $n$ points should fall
along a straight line with slope $1/\lambda$.

Figure  3  contains  the  plot  described  above.  While  we  do  not  perform  a 
formal  statistical  test,  the  points  indeed  seem  to  fall  along  a  straight  line. 
Moreover,  if  we  fit  a  least-squares  regression  line  through  these  points  and 
constrain  it  to  pass  through  the  origin,  we  obtain  a  slope  coefficient  of 
11,570.8.  Taking  the  reciprocal  of  the  slope  coefficient  and  multiplying  by 
15,000,  we  obtain  an  estimate  of  1.296  as  the  expected  number  of  accidents  in 
an  interval  containing  15,000  flight  hours.  This  estimate  is  quite  close  to  our 
earlier  estimate  of  1.372. 
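
The plotting positions and the through-origin fit can be sketched in Python as follows. The elapsed flight hours used here are hypothetical values placed exactly on the exponential quantiles, so the fitted slope recovers the reciprocal of the assumed rate; the function names are ours, and the individual A-7 intervals are not reproduced.

```python
import math

def exponential_plot_points(gaps):
    """Pair each ordered gap t_i with -log[1 - i/(n + 1)], as in equation (4)."""
    ordered = sorted(gaps)
    n = len(ordered)
    positions = [-math.log(1.0 - i / (n + 1)) for i in range(1, n + 1)]
    return positions, ordered

def slope_through_origin(x, y):
    """Least-squares slope of y on x with the line forced through the origin."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)

# Hypothetical gaps placed exactly on the exponential quantiles for a rate of
# one accident per 10,000 flight hours, so the fitted slope recovers 1/lam.
lam = 1.0 / 10_000
n = 50
gaps = [-math.log(1.0 - i / (n + 1)) / lam for i in range(1, n + 1)]
x, y = exponential_plot_points(gaps)
slope = slope_through_origin(x, y)
print(slope)               # approximately 10000 flight hours per accident
print(15_000 / 11_570.8)   # the conversion used in the text, about 1.296
```

With real data the points scatter around the line, and the reciprocal of the fitted slope, scaled by 15,000, gives the expected number of accidents per interval.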

Since the Poisson distribution fits the data according to all three
goodness-of-fit tests, we conclude that the expected number of accidents is
exactly  proportional  to  flight  hours.  This  is  true  despite  the  negative  simple 
correlation  between  monthly  accidents  and  monthly  flight  hours  reported 
earlier. 


NEGATIVE  BINOMIAL  MODEL 


Although  we  have  provided  an  example  of  real-world  data  that  follow  a 
Poisson  distribution,  other  situations  may  require  a  more  flexible 
distribution.  In  particular,  the  Poisson  distribution  constrains  the  mean  and 
variance  to  equality.  By  contrast,  the  negative  binomial  distribution  allows 
the  variance  to  exceed  the  mean. 

An  interesting  derivation  of  the  negative  binomial  distribution  was  first 
given  in  a  classic  paper  by  Greenwood  and  Yule  [4].  Their  paper  dealt  with 
the  distribution  of  industrial  accidents  among  workers.  They  assumed  that 
the  number  of  accidents  for  a  given  workman  in  a  fixed-length  interval  of  time 
followed  a  Poisson  distribution.  However,  they  found  that  different  workmen 
had different degrees of "accident-proneness," hence different means for their
respective  Poisson  distributions.  They  assumed  that  the  Poisson  mean  had  a 
Gamma  distribution  across  workmen: 

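$$h(\lambda) = \frac{\beta^{\alpha}}{\Gamma(\alpha)}\,\lambda^{\alpha-1}\exp(-\beta\lambda)$$

where $\Gamma$ is the gamma function. Note that the gamma distribution has mean
$\alpha/\beta$ and variance $\alpha/\beta^2$. The distribution of accidents across workmen is given
by:

$$\int_0^{\infty} g(X, t; \lambda)\,h(\lambda)\,d\lambda$$

where $g(X, t; \lambda)$ is the Poisson distribution in equation (1). After repeated
integration by parts, we find that the distribution is equal to:

$$\binom{X+\alpha-1}{X}\left(\frac{\beta}{\beta+t}\right)^{\alpha}\left(\frac{t}{\beta+t}\right)^{X} \qquad (6)$$

for $X = 0, 1, 2, 3, \ldots$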


Equation (6) is the negative binomial distribution. Its mean is given by:

$$\begin{aligned}
E(X) &= E_\lambda E(X \mid \lambda) \\
     &= E_\lambda(\lambda t) \\
     &= \alpha t/\beta. \qquad (7)
\end{aligned}$$

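To compute the variance, observe that:

$$\begin{aligned}
E(X^2) &= E_\lambda E(X^2 \mid \lambda) \\
       &= E_\lambda[\mathrm{Var}(X \mid \lambda) + E^2(X \mid \lambda)] \\
       &= \alpha t(\beta + t + \alpha t)/\beta^2.
\end{aligned}$$

It follows that:

$$\begin{aligned}
\mathrm{Var}(X) &= E(X^2) - E^2(X) \\
                &= \alpha t(\beta + t)/\beta^2. \qquad (8)
\end{aligned}$$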
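
Equations (7) and (8) can be verified numerically by summing over the negative binomial probabilities. In the Python sketch below the binomial coefficient is written with the gamma function so that non-integer values of the shape parameter are allowed. The parameter values (alpha and beta for the two Gamma parameters, t for the interval length) are hypothetical and chosen only for illustration.

```python
import math

def negative_binomial_pmf(x, t, alpha, beta):
    """Gamma-mixed Poisson probability of x occurrences, on the log scale."""
    log_p = (math.lgamma(x + alpha) - math.lgamma(alpha) - math.lgamma(x + 1)
             + alpha * math.log(beta / (beta + t))
             + x * math.log(t / (beta + t)))
    return math.exp(log_p)

# Hypothetical parameter values chosen only for illustration.
t, alpha, beta = 3.0, 2.5, 4.0
support = range(2000)  # the omitted tail is negligible for these values
mean = sum(x * negative_binomial_pmf(x, t, alpha, beta) for x in support)
second = sum(x * x * negative_binomial_pmf(x, t, alpha, beta) for x in support)
variance = second - mean ** 2

print(mean, alpha * t / beta)                        # compare with equation (7)
print(variance, alpha * t * (beta + t) / beta ** 2)  # compare with equation (8)
```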

Brown and Rogers [2] have applied the same line of reasoning to an
analysis of material failures. Moreover, they derived an upper bound on the
correlation between $X$ and $t$ that generalizes the expression for the Poisson
distribution that we derived earlier in equation (3). Their bound is:

$$\mathrm{Correlation}(X, t) \le \left[1 + \frac{\beta E(t)}{\alpha\,\mathrm{Var}(t)}\right]^{-1/2} \qquad (9)$$

Equation (9) is quite analogous to equation (3) because $\alpha/\beta$, the mean of the
prior distribution of $\lambda$, replaces the single value of $\lambda$ appearing in equation (3).
However, equation (9) is an inequality while equation (3) is an equality.
Hence the correlation may be even smaller under a negative binomial
distribution than under a Poisson distribution.

We fit a negative binomial distribution to the data on Class A Navy A-7
accidents reported earlier in table 1. Using the method-of-moments, we
equated our sample mean of 1.372 to the population mean given by equation
(7), and our sample variance of 1.711 to the population variance given by
equation (8). We then solved for the values $\alpha = 5.492$ and $\beta = 60{,}154$. These
values imply that the prior distribution of $\lambda$ has a mean of .0000913 and a
variance that vanishes to 8 decimal places. The negligible variance of the
prior distribution implies that the simple Poisson distribution with a fixed
value of $\lambda$ seems perfectly adequate to describe the data.
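
The implied moments of the prior distribution follow directly from the two fitted Gamma parameters, since a Gamma distribution has a mean equal to their ratio and a variance equal to the first parameter divided by the square of the second. A short Python check, with variable names of our own choosing:

```python
# Method-of-moments estimates reported in the text.
alpha, beta = 5.492, 60_154.0

prior_mean = alpha / beta            # mean of the Gamma distribution of the rate
prior_variance = alpha / beta ** 2   # variance of the Gamma distribution

print(f"{prior_mean:.7f}")      # 0.0000913
print(f"{prior_variance:.8f}")  # 0.00000000
```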

POISSON  REGRESSION  MODEL 

The negative binomial model provides a flexible tool for situations in
which the variance of the data exceeds the mean. However, a drawback of this
model is that it "explains" the excess variation in the data by simply adding
an additional source of randomness. A more satisfying approach may be to
explain the data by introducing observable variables thought to influence the
propensity to fail, such as crew manning and experience levels. The correct
technique for estimating the influence of these variables is not a linear
regression because, as we have seen, linear regression does not even reveal
the proportionality between flight hours and the expected number of failures.
A better approach is to express the failure rate (rather than the number of
failures) as a function of explanatory variables, using the technique of Poisson
regression.

To ensure non-negativity, the failure rate in the Poisson regression
model is written as an exponential function of the vector $Z_i$ of explanatory
variables:

$$\lambda_i = \exp(Z_i\beta) \qquad (10)$$

where $\beta$ is an unknown but estimable vector of coefficients. The likelihood
function is obtained by substituting equation (10) into equation (1), and
forming the product over all observations in the sample. The vector $\beta$ is
chosen to maximize the likelihood function. It is easy to show that the
log-likelihood function is globally concave as long as the data matrix (the matrix
with rows $Z_i$) is of full rank. Hence the maximum likelihood estimate of $\beta$ is
readily obtained using standard algorithms such as Newton's method.
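
To make the estimation step concrete, the following Python sketch applies Newton's method to a Poisson regression with an intercept and a single explanatory variable, treating flight hours as the interval length. It is a minimal illustration under our own assumptions: the data are hypothetical, the function name is ours, and the two-by-two linear system is solved by hand rather than with a matrix library.

```python
import math

def fit_poisson_regression(counts, exposures, covariate, iterations=25):
    """Newton's method for X_i ~ Poisson(t_i * exp(b0 + b1 * z_i))."""
    # Start from the pooled failure rate with no covariate effect.
    b0 = math.log(sum(counts) / sum(exposures))
    b1 = 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, t, z in zip(counts, exposures, covariate):
            mu = t * math.exp(b0 + b1 * z)  # expected count for this observation
            g0 += x - mu                    # gradient of the log-likelihood
            g1 += (x - mu) * z
            h00 += mu                       # negative Hessian (positive definite)
            h01 += mu * z
            h11 += mu * z * z
        det = h00 * h11 - h01 * h01
        # Newton step: add (negative Hessian)^(-1) times the gradient.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Hypothetical data: failures, flight hours, and a 0/1 explanatory variable.
counts = [1, 3, 6, 10]
exposures = [100.0, 200.0, 150.0, 250.0]
covariate = [0, 0, 1, 1]
b0, b1 = fit_poisson_regression(counts, exposures, covariate)
print(math.exp(b0), math.exp(b0 + b1))  # fitted failure rates for z = 0 and z = 1
```

With a single 0/1 explanatory variable the maximum likelihood rates are simply total failures divided by total flight hours within each group, which gives an easy check on the iteration.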

The  introduction  of  explanatory  variables  will  help  to  eliminate  a 
portion  of  the  excess  variation  in  the  data.  However,  even  after  controlling  for 
the  explanatory  variables,  the  variance  of  the  data  may  still  exceed  the  mean. 
To  deal  with  this  situation,  Hausman,  Hall,  and  Griliches  [5]  generalized  the 
Poisson  regression  model  to  allow  random  variation  in  the  failure  rates 
beyond  the  systematic  variation  induced  by  the  variation  in  the  explanatory 
variables. 

Hausman et al. express the failure rate as:

$$\lambda_i = v_i \exp(Z_i\beta) = \exp(Z_i\beta + u_i) \qquad (11)$$

where $v_i = \exp(u_i)$. They assume that $v_i$ has a Gamma distribution across the
observational units. Further, they set the two parameters of the Gamma
distribution equal, so that the distribution has mean 1 and variance $1/\alpha$ where
$\alpha$ is the common value of the two parameters. This normalization involves no
loss of generality as long as the data matrix contains a column of 1's
corresponding to an intercept.


After repeated integration by parts, we find that the distribution of
failures is equal to:

$$\binom{X+\alpha-1}{X}\left(\frac{t\exp(Z_i\beta)}{\alpha + t\exp(Z_i\beta)}\right)^{X}\left(\frac{\alpha}{\alpha + t\exp(Z_i\beta)}\right)^{\alpha} \qquad (12)$$

for $X = 0, 1, 2, 3, \ldots$ Equation (12) is a negative binomial distribution, but its
moments are different from those of the negative binomial distribution
derived earlier in equation (6). The mean is given by:

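$$\begin{aligned}
E(X) &= E_v E(X \mid v) \\
     &= E_v[v\,t\exp(Z\beta)] \\
     &= t\exp(Z\beta).
\end{aligned}$$

To compute the variance, observe that:

$$\begin{aligned}
E(X^2) &= E_v E(X^2 \mid v) \\
       &= E_v[\mathrm{Var}(X \mid v) + E^2(X \mid v)] \\
       &= t\exp(Z\beta) + t^2[\exp(Z\beta)]^2(1+\alpha)/\alpha.
\end{aligned}$$

It follows that:

$$\begin{aligned}
\mathrm{Var}(X) &= E(X^2) - E^2(X) \\
                &= t\exp(Z\beta) + t^2[\exp(Z\beta)]^2/\alpha.
\end{aligned}$$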
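
The last line of the second-moment calculation uses the first two moments of the Gamma-distributed multiplier. Spelled out under the normalization stated earlier (mean 1 and variance equal to the reciprocal of the common parameter value), the intermediate step is:

```latex
\begin{aligned}
E_v\!\left[\mathrm{Var}(X \mid v) + E^2(X \mid v)\right]
  &= E_v\!\left[v\,t\,e^{Z\beta} + v^2 t^2 e^{2Z\beta}\right] \\
  &= t\,e^{Z\beta}\,E(v) + t^2 e^{2Z\beta}\,E(v^2), \\
E(v^2) &= \mathrm{Var}(v) + E^2(v) = \frac{1}{\alpha} + 1 = \frac{1+\alpha}{\alpha}.
\end{aligned}
```

Subtracting the squared mean then leaves the Poisson variance plus an excess term that shrinks as the common parameter value grows, so the Poisson regression model is recovered in the limit.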

The parameters $\alpha$ and $\beta$ may again be estimated using the method of
maximum likelihood. The estimate of $\beta$ obtained from the Poisson regression
model provides a convenient starting value for the iteration. However, the
log-likelihood function is no longer globally concave, so more care must be
exercised in searching for the global maximum.


CONCLUSIONS 

If material failures follow a Poisson distribution, then the expected
number of failures is exactly proportional to flight hours. However, this
proportionality will not be revealed by simple correlation or regression analysis
between monthly flight hours and the number of monthly failures. To test for
proportionality, we must test the underlying hypothesis that the data follow a
Poisson distribution.


We  have  presented  three  goodness-of-fit  tests  for  a  Poisson  distribution. 
We  have  shown  that  the  Poisson  distribution  is  perfectly  adequate  to  describe 
data  on  Class  A  Navy  A-7  accidents  over  the  period  CY  1977  — CY  1983. 

In cases where the Poisson distribution does not fit the data, we present
several alternative models. First, the mean of the Poisson distribution may
itself be randomly distributed across observational units according to a
Gamma distribution. If so, the number of failures will have a negative
binomial distribution. Second, the mean of the Poisson distribution may depend
systematically upon a set of observable explanatory variables. In this case,
the Poisson regression model is appropriate. Finally, the mean of the Poisson
distribution may contain both a systematic component that depends upon
observable variables and a random component that follows a Gamma
distribution. This situation yields a generalized Poisson regression model.